Reducing false positives in molecular pattern recognition.

Authors

  • Xijin Ge
  • Shuichi Tsutsumi
  • Hiroyuki Aburatani
  • Shuichi Iwata
Abstract

In the search for new cancer subtypes by gene expression profiling, it is essential to avoid misclassifying samples of unknown subtypes as known ones. In this paper, we evaluated the false positive error rates of several classification algorithms through a 'null test', in which the classifiers are presented with a large collection of independent samples that do not belong to any of the tumor types in the training dataset. The benchmark dataset is available at www2.genome.rcast.u-tokyo.ac.jp/pm/. We found that k-nearest neighbor (KNN) and support vector machine (SVM) classifiers have very high false positive error rates when fewer than 100 genes are used in prediction. The error rate can be partially reduced by including more genes. In contrast, the prototype matching (PM) method has a much lower false positive error rate. This robustness can be achieved without loss of sensitivity by introducing suitable measures of prediction confidence. We also proposed a cluster-and-select technique to select genes for classification. The nonparametric Kruskal-Wallis H test is used to select genes that are differentially expressed across multiple tumor types. To reduce redundancy, we then divided these genes into clusters with similar expression patterns and selected a given number of genes from each cluster. The reliability of the new algorithm was tested on three public datasets.
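The cluster-and-select idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering method, the similarity measure, or any thresholds, so everything beyond the Kruskal-Wallis H statistic is an assumption made for illustration: the function names, the greedy correlation-based grouping, the Pearson similarity, and the parameters `n_candidates`, `min_corr`, and `per_cluster` are all hypothetical and are not the authors' implementation.

```python
from statistics import mean


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (with tie correction) for a list of groups."""
    values = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(values)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j][0] == values[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for _, gi in values[i:j]:
            rank_sums[gi] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def cluster_and_select(expr, labels, n_candidates, min_corr, per_cluster):
    """Rank genes by H, group similar profiles, keep a few genes per group.

    expr maps gene name -> expression values (one per sample);
    labels gives the tumor type of each sample.
    The greedy grouping below is an assumed stand-in for the clustering step.
    """
    classes = sorted(set(labels))

    def h_of(gene):
        by_class = [
            [v for v, lab in zip(expr[gene], labels) if lab == c]
            for c in classes
        ]
        return kruskal_wallis_h(by_class)

    # Step 1: keep the genes with the largest H (most class-dependent).
    ranked = sorted(expr, key=h_of, reverse=True)[:n_candidates]

    # Step 2: a gene joins the first cluster whose seed it correlates with.
    clusters = []
    for gene in ranked:
        for cluster in clusters:
            if pearson(expr[gene], expr[cluster[0]]) >= min_corr:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])

    # Step 3: take a fixed number of genes from each cluster.
    return [g for cluster in clusters for g in cluster[:per_cluster]]


# Toy data: three tumor types, three samples each (values are made up).
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
expr = {
    "g1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "g2": [2, 4, 6, 8, 10, 12, 14, 16, 18],  # redundant with g1
    "g3": [9, 8, 7, 3, 2, 1, 6, 5, 4],       # informative, different pattern
    "g4": [5, 1, 9, 4, 8, 2, 6, 3, 7],       # not class-dependent
}
selected = cluster_and_select(expr, labels, n_candidates=3,
                              min_corr=0.9, per_cluster=1)
print(selected)  # ['g1', 'g3']
```

In this toy case `g2` is dropped because it duplicates the pattern of `g1`, which is the kind of redundancy the clustering step is meant to remove, while `g4` never passes the H-based ranking.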

Similar resources

Detection of burial mounds in high-resolution satellite images of agricultural land

Many archaeological sites are discovered during building and road construction work, prompting full excavations and delay in construction. In order to detect more cultural heritage sites in advance of construction work, the Norwegian Directorate for Cultural Heritage has taken an initiative to develop tools for early detection of potential cultural heritage sites in satellite images. The presen...

Keshmesh: Bringing Advanced Static Analysis to Concurrency Bug Pattern Detectors

Bug patterns are coding idioms that may make the code less maintainable or turn into bugs in future. The state-of-the-art tools for detecting concurrency bug patterns (CBPs) perform simple, intraprocedural analyses. While this simplicity makes the analysis fast, it does not provide protection against CBPs that involve aliasing or multiple methods. This paper introduces a practical and extensibl...

Region-based Mixture of Gaussians modelling for foreground detection in dynamic scenes

One of the most widely used techniques in computer vision for foreground detection is to model each background pixel as a Mixture of Gaussians (MoG). While this is effective for a static camera with a fixed or a slowly varying background, it fails to handle any fast, dynamic movement in the background. In this paper, we propose a generalised framework, called region-based MoG (RMoG), that takes ...

Reduction of false positives in structure-based virtual screening when receptor plasticity is considered.

Structure-based virtual screening for selecting potential drug candidates is usually challenged by how numerous false positives in a molecule library are excluded when receptor plasticity is considered. In this study, based on the binding energy landscape theory, a hypothesis that a true inhibitor can bind to different conformations of the binding site favorably was put forth, and related strat...

Imaging in gynaecology: How good are we in identifying endometriomas?

AIM: To evaluate the performance of subjective evaluation of ultrasound findings (pattern recognition) to discriminate endometriomas from other types of adnexal masses and to compare the demographic and ultrasound characteristics of the true positive cases with those cases that were presumed to be an endometrioma but proved to have a different histology (false positive cases) and the endometrio...

Journal title:
  • Genome informatics. International Conference on Genome Informatics

Volume: 14

Publication date: 2003